Distribution-free root cause analysis
We study distribution-free root cause analysis in multi-stream data, where an evolving underlying system is observed through multiple data streams that may each undergo distributional changes at unknown timepoints. In such settings, the stream exhibiting the earliest change provides a natural starting point for investigating the underlying cause; we refer to the index of this stream as the root-cause index. Leveraging conformal $p$-values, we propose a novel framework, Conformal Root Cause Analysis (CROC), which constructs finite-sample valid confidence sets for the root-cause index under minimal assumptions: the data streams are independent, and within each stream the pre- and post-change observations are sampled exchangeably from arbitrary and unknown distributions. We further establish a universality property, showing that any distribution-free method for root cause localization can be represented within the CROC framework. In addition, under mild regularity conditions and principled score design, our method yields asymptotically sharp confidence sets that efficiently isolate the root cause. We also extend CROC to efficiently handle cross-stream dependence when present. Extensive simulations demonstrate accurate localization of the root stream, supporting our theoretical guarantees.
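To make the notion of a confidence set for the root-cause index concrete, here is a minimal Python sketch. It is not the CROC procedure: it assumes, for simplicity, that every one of the $K$ streams has exactly one changepoint and that a confidence set for each stream's changepoint is already available (the hypothetical input `per_stream_sets`), each valid at level $1-\alpha/K$ so that, by a union bound, all $K$ sets cover simultaneously with probability at least $1-\alpha$. A stream is kept as a root-cause candidate when its earliest plausible changepoint is no later than the latest plausible changepoint of every other stream.

```python
from typing import Dict, List


def root_cause_candidates(per_stream_sets: Dict[str, List[int]]) -> List[str]:
    """Return the streams that could have changed first.

    per_stream_sets maps a stream name to a non-empty confidence set
    (list of time indices) for that stream's changepoint. The function
    name and input format are illustrative assumptions, not part of CROC.
    """
    lows = {k: min(s) for k, s in per_stream_sets.items()}
    highs = {k: max(s) for k, s in per_stream_sets.items()}
    kept = []
    for k in per_stream_sets:
        # Stream k can be the earliest only if its earliest plausible change
        # does not come after another stream's latest plausible change.
        if all(lows[k] <= highs[j] for j in per_stream_sets if j != k):
            kept.append(k)
    return kept


# Made-up per-stream sets, purely for illustration.
sets = {"cpu": [40, 41, 42], "latency": [41, 42, 43, 44], "errors": [60, 61]}
print(root_cause_candidates(sets))  # ['cpu', 'latency']
```

If all per-stream sets cover their changepoints, the true root stream is always kept, since its earliest plausible change cannot exceed any other stream's latest plausible change. The Bonferroni split is conservative; the abstract's claims of sharpness and universality concern CROC itself, not this simplification.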
Conformal changepoint localization
We study the problem of offline changepoint localization in a distribution-free setting. One observes a vector of data with a single changepoint, assuming that the data before and after the changepoint are i.i.d. (or more generally exchangeable) from arbitrary and unknown distributions. The goal is to produce a finite-sample confidence set for the index at which the change occurs without making any other assumptions. Existing methods often rely on parametric assumptions, tail conditions, or asymptotic approximations, or only produce point estimates. In contrast, our distribution-free algorithm, CONformal CHangepoint localization (CONCH), only leverages exchangeability arguments to construct confidence sets with finite-sample coverage. By proving a conformal Neyman-Pearson lemma, we derive principled score functions that yield informative (small) sets. Moreover, with such score functions, the normalized length of the confidence set shrinks to zero under weak assumptions. We also establish a universality result showing that any distribution-free changepoint localization method must be an instance of CONCH. Experiments suggest that CONCH delivers precise confidence sets even in challenging settings involving images or text.
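The idea of testing each candidate index using only exchangeability can be illustrated with a small Python sketch. This is not the CONCH algorithm, which is built on conformal $p$-values and principled score functions; the sketch swaps in plain permutation tests with a CUSUM-style statistic, and all names and parameters (`cusum_stat`, `perm_pvalue`, `changepoint_confidence_set`, `n_perm`, `seed`) are hypothetical. A candidate $t$ is kept when neither the first $t$ points nor the remaining points look non-exchangeable, with a Bonferroni split of $\alpha$ across the two segments.

```python
import random


def cusum_stat(xs):
    """Max absolute centered partial sum; large when the mean shifts inside xs."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean = sum(xs) / n
    running, best = 0.0, 0.0
    for x in xs[:-1]:  # the full centered sum is zero, so skip the last term
        running += x - mean
        best = max(best, abs(running))
    return best


def perm_pvalue(xs, n_perm, rng):
    """Permutation p-value for the hypothesis that xs is exchangeable."""
    if len(xs) < 2:
        return 1.0
    observed = cusum_stat(xs)
    pool = list(xs)
    hits = 1  # count the observed ordering itself, which keeps the test valid
    for _ in range(n_perm):
        rng.shuffle(pool)
        if cusum_stat(pool) >= observed:
            hits += 1
    return hits / (n_perm + 1)


def changepoint_confidence_set(xs, alpha=0.1, n_perm=199, seed=0):
    """Candidates t (change after the first t points) not rejected at level alpha."""
    rng = random.Random(seed)
    kept = []
    for t in range(1, len(xs)):
        p_left = perm_pvalue(xs[:t], n_perm, rng)
        p_right = perm_pvalue(xs[t:], n_perm, rng)
        # Bonferroni: reject t if either segment looks non-exchangeable.
        if min(p_left, p_right) > alpha / 2:
            kept.append(t)
    return kept


xs = [0.0] * 10 + [5.0] * 10  # toy data: mean shift after the first 10 points
print(changepoint_confidence_set(xs))
```

On this toy input both segments are constant at $t = 10$, so both permutation $p$-values equal 1 and the true changepoint is retained. Candidates far from 10 tend to be rejected because one of the two segments then straddles the shift. The mean-based statistic is a design choice for this illustration only; it would have little power against changes that leave the mean unchanged.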